ComPAS: Community Preserving Sampling for Streaming Graphs

Authors

  • Sandipan Sikdar
  • Tanmoy Chakraborty
  • Soumya Sarkar
  • Niloy Ganguly
  • Animesh Mukherjee
Abstract

In the era of big data, graph sampling is indispensable in many settings. Existing sampling methods are mostly designed for static graphs and aim to preserve basic structural properties of the original graph (such as degree distribution and clustering coefficient) in the sample. We argue that no sampling method can produce a universal representative sample that preserves all the properties of the original graph; rather, sampling should be application specific (such as preserving the hubs needed for information diffusion). Here we consider community detection as an application scenario. We propose ComPAS, a novel sampling strategy that, unlike previous methods, is not only designed for streaming graphs (which are a more realistic representation of a real-world scenario) but also preserves the community structure of the original graph in the sample. Empirical results on both synthetic and different real-world graphs show that ComPAS is the best at preserving the underlying community structure, with average performance reaching 73.2% of the most informed algorithm for static graphs.

Similar resources

Network Sampling: From Static to Streaming Graphs

Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network sci...

Approximate Integration of streaming data

We approximate analytic queries on streaming data with weighted reservoir sampling. For a stream of tuples of a data warehouse, we show how to approximate some OLAP queries. For a stream of graph edges from a social network, we approximate the communities as the large connected components of the edges in the reservoir. We show that for a model of random graphs which follow a power law degree di...

Can Sampling Preserve Application Adoption Process over OSN Graphs?

Mohammad Rezaur Rahman, Chen-Nee Chuah ({mrrahman, chuah}@ucdavis.edu). Online social network (OSN)-based applications often rely on user interactions to propagate information or to recruit more users. Understanding the adoption or cascade process of an idea, a product, or a new application over an OSN graph is of great inte...

Sublinear Algorithms for MAXCUT and Correlation Clustering

We study sublinear algorithms for two fundamental graph problems, MAXCUT and correlation clustering. Our focus is on constructing core-sets as well as developing streaming algorithms for these problems. Constant space algorithms are known for dense graphs for these problems, while Ω(n) lower bounds exist (in the streaming setting) for sparse graphs. Our goal in this paper is to bridge the gap b...

A Hybrid Sampling Scheme for Triangle Counting

We study the problem of estimating the number of triangles in a graph stream. No streaming algorithm can get sublinear space on all graphs, so methods in this area bound the space in terms of parameters of the input graph such as the maximum number of triangles sharing a single edge. We give a sampling algorithm that is additionally parameterized by the maximum number of triangles sharing a sin...

Journal title:
  • CoRR

Volume abs/1802.01614

Publication date: 2018